

OTEP: Recording exceptions as log based events #4333

Draft: wants to merge 7 commits into main


@lmolkova (Contributor) commented Dec 10, 2024

Related to open-telemetry/semantic-conventions#1536

Changes

Recording exceptions as span events is problematic since it

  • ties recording exceptions to tracing/sampling
  • duplicates exceptions recorded by instrumented libraries on logs
  • does not leverage log features such as typical log filtering based on severity

This OTEP provides guidance on how to record exceptions using OpenTelemetry logs, focusing on minimizing duplication and providing context to reduce noise.
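The sketch below shows how an exception could be mapped onto a log record. It is minimal and rests on several assumptions:

- `ExceptionLogRecord` and `ExceptionLogs` are hypothetical names, not OpenTelemetry SDK classes.
- The `exception.type`, `exception.message` and `exception.stacktrace` attribute names come from the exception semantic conventions.
- The severity numbers follow the log data model ranges (WARN = 13, ERROR = 17).
- Recording handled exceptions at WARN is an illustrative choice, not something this OTEP mandates.

The point is that ordinary severity-based log filtering then applies to exceptions.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a log record; not an OpenTelemetry SDK type.
record ExceptionLogRecord(int severityNumber, String severityText, Map<String, String> attributes) {}

final class ExceptionLogs {
    // Severity numbers from the OpenTelemetry log data model.
    static final int SEVERITY_WARN = 13;
    static final int SEVERITY_ERROR = 17;

    private ExceptionLogs() {}

    // Maps an exception to a log record using the exception.* semantic-convention attributes.
    // Assumption for illustration: handled exceptions get WARN, unhandled ones get ERROR.
    static ExceptionLogRecord fromException(Throwable t, boolean handled) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("exception.type", t.getClass().getName());
        if (t.getMessage() != null) {
            attrs.put("exception.message", t.getMessage());
        }
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        attrs.put("exception.stacktrace", sw.toString());
        return handled
            ? new ExceptionLogRecord(SEVERITY_WARN, "WARN", attrs)
            : new ExceptionLogRecord(SEVERITY_ERROR, "ERROR", attrs);
    }

    // Severity-based filtering, as a log pipeline might apply it before export.
    static boolean passesFilter(ExceptionLogRecord r, int minSeverity) {
        return r.severityNumber() >= minSeverity;
    }
}
```

With a threshold of 17, a pipeline would keep the unhandled (ERROR) record and drop the handled (WARN) one. A span event offers no equivalent knob.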

If accepted, the follow-up spec changes are expected to replace existing (stable) documents:


@lmolkova changed the title from "OTEP: Recording exceptions and errors with OpenTelemetry" to "OTEP: Recording exceptions as log based events" Dec 10, 2024

## Motivation

OTel recommends recording exceptions using span events available through Trace API. Outside of OTel world, exceptions are usually recorded by user apps and libraries using logging libraries.
Member

> OTel recommends recording exceptions using span events

Do we actually make such a recommendation?

https://github.com/open-telemetry/semantic-conventions/tree/main/docs/exceptions lists conventions on how to store exceptions in spans and logs, but I don't see any recommendation being made there. Is there another place I am missing?

@lmolkova (Contributor Author) Dec 13, 2024

You're right, I'll change the wording.

What I mean is that the only way documented in the spec is https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/exceptions.md. Combined with the lack of a user-facing Logs API until recently, this leaves span events as the only guaranteed way to record exceptions for an instrumentation library that doesn't want to depend on a third-party logging facade. That's not a problem in some languages, but it is in others.



Log-based exception events have the following advantages over span events:
Member

Agree with all of this. A minor point worth adding: one advantage of using span events for exceptions is that they are automatically sampled along with the corresponding spans. It is possible to achieve a similar effect with logs, but users have to do extra work to ensure logs are sampled consistently with spans.

Contributor Author

I feel like it's a disadvantage, since users don't have a choice. A similar effect can be achieved with a configuration option and logger.IsEnabled(..., context). In other words, I don't think this is a fundamental problem: we can make log sampling almost as easy as it is with span events.
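The sketch below illustrates the idea in the comment above: a configuration option plus an enabled-check consulted before an exception log is built. The check is modelled loosely on an `isEnabled(severity, context)`-style call. `TraceContext`, `SamplingPolicy` and `ExceptionLogGate` are hypothetical names, not OpenTelemetry APIs, and the two policies are assumptions chosen for illustration.

```java
// Hypothetical: the trace context an exception log would be emitted in.
record TraceContext(boolean hasSpan, boolean sampled) {}

// Hypothetical configuration option controlling how exception logs relate to span sampling.
enum SamplingPolicy {
    ALWAYS,      // record exception logs regardless of the span sampling decision
    FOLLOW_SPAN  // record only when there is no span, or the current span is sampled
}

final class ExceptionLogGate {
    private final SamplingPolicy policy;
    private final int minSeverity;

    ExceptionLogGate(SamplingPolicy policy, int minSeverity) {
        this.policy = policy;
        this.minSeverity = minSeverity;
    }

    // Similar in spirit to logger.IsEnabled(..., context): decide before building the record.
    boolean isEnabled(int severityNumber, TraceContext ctx) {
        if (severityNumber < minSeverity) {
            return false;
        }
        return switch (policy) {
            case ALWAYS -> true;
            case FOLLOW_SPAN -> !ctx.hasSpan() || ctx.sampled();
        };
    }
}
```

Under `FOLLOW_SPAN` an exception with no surrounding span is still recorded. This keeps the advantage of covering operations that have no tracing instrumentation, while exceptions inside unsampled spans are skipped just as span events would be.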

- they can be recorded for operations that don't have any tracing instrumentation
- they can be sampled along with or separately from spans
- they can have different severity levels to reflect how critical the exception is
- they are already reported natively by many frameworks and libraries
Member

One more advantage: span events can be dropped because of the Max_SpanEvents_Per_Span limit.

Contributor Author

Curious why it is an advantage? I think the limit is a safety belt that prevents buffering an unbounded number of events on spans. With log-based events we have the batching processor for that, and we can also do more interesting things like log throttling in the pipeline.

Member

I meant that when span events are used for exceptions, there is a chance that my exception is the one that gets dropped because the span's event list is already full of other events. Yes, the limit is there for safety, but losing the exception is also bad.
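The concern can be illustrated with a simplified model. It assumes a span that keeps its first N events and discards any added once the limit is reached. `BoundedSpanEvents` and the limit value are illustrative; this is not the SDK's implementation. An exception recorded late in a busy span is the event that gets lost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of a span's bounded event list.
final class BoundedSpanEvents {
    private final int maxEvents;
    private final List<String> events = new ArrayList<>();
    private int droppedEvents = 0;

    BoundedSpanEvents(int maxEvents) {
        this.maxEvents = maxEvents;
    }

    // Keeps the first maxEvents events; later events are counted as dropped and discarded.
    boolean addEvent(String name) {
        if (events.size() >= maxEvents) {
            droppedEvents++;
            return false;
        }
        events.add(name);
        return true;
    }

    List<String> events() {
        return Collections.unmodifiableList(events);
    }

    int droppedEvents() {
        return droppedEvents;
    }
}
```

With a limit of 2, two earlier "retry" events fill the span and the subsequent "exception" event is discarded. A log-based exception is not subject to that per-span limit. It is handled by the log pipeline (for example, the batching processor mentioned above), which has its own, separately configurable limits.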

@pellared (Member) commented Dec 10, 2024

I think this is a related issue:

```java
// we're rethrowing an exception here since the underlying
// platform code may or may not record exception logs depending on JRE,
// configuration, and other implementation details
logger.eventBuilder("exception")
```
Member

"exception" here would be the EventName, right? Is that the intent? Shouldn't it be more fully qualified than just "exception"? I was thinking of something like Namespace.Networking.SocketChannel.Write.Exception.

@pellared (Member) Dec 10, 2024

Are we sure that we want to model it as an OTel Event? How would we define the event name for concrete exception types? Maybe it should just be a Log Record (see #4234).

"exception" is almost as abstract as "event".
Different instrumentation libraries may want to add additional contextual attributes related to the exception. There may also be use cases where one would like to set a complex body.

On the other hand, we can say that OTel Events have "minimal" structural requirements, so instrumentations may be able to add any additional data.

Contributor Author

Great points!

I agree that the exception event name is not particularly useful. We don't have to record exceptions as events at all. I changed this OTEP to default to logs. The text also recommends defining custom error events.

oteps/4333-recording-exceptions-on-logs.md (outdated, resolved)
@tedsuo added the OTEP (OpenTelemetry Enhancement Proposal) label Dec 12, 2024
@lmolkova force-pushed the exceptions-on-logs-otep branch from 920f366 to 569313a December 16, 2024 01:12
@lmolkova force-pushed the exceptions-on-logs-otep branch from b06a09f to 76c7d85 December 17, 2024 17:30

1. OpenTelemetry should provide configuration options and APIs that allow users, among other things, to:

- Record unhandled exceptions only (the default documented in this guidance)
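The sketch below shows one way such an option could look. Only the "unhandled exceptions only" default comes from the text. `ExceptionRecordingMode`, its `ALL` alternative and `ExceptionRecordingPolicy` are hypothetical names chosen for illustration.

```java
// Hypothetical configuration option; only UNHANDLED_ONLY (the default) comes from the OTEP text.
enum ExceptionRecordingMode {
    UNHANDLED_ONLY,
    ALL
}

final class ExceptionRecordingPolicy {
    private final ExceptionRecordingMode mode;

    ExceptionRecordingPolicy(ExceptionRecordingMode mode) {
        this.mode = mode;
    }

    // Default documented in the guidance: record unhandled exceptions only.
    ExceptionRecordingPolicy() {
        this(ExceptionRecordingMode.UNHANDLED_ONLY);
    }

    // handled == true means the exception was caught and handled inside the instrumented code.
    boolean shouldRecord(boolean handled) {
        return mode == ExceptionRecordingMode.ALL || !handled;
    }
}
```

With the default policy, exceptions that an instrumented library catches and handles produce no record, which avoids the duplication described above. Users who want every exception can opt into `ALL`.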

Labels: OTEP (OpenTelemetry Enhancement Proposal)
Projects: Status: In progress
6 participants